Amazon Big Data Blog
Utilizing DeepSeek with Amazon OpenSearch Service Vector Database and Amazon SageMaker
by Marcus Bennett and Lila Chang on 07 FEB 2025
Category: Amazon OpenSearch Service, Amazon SageMaker, Technical How-to
Beyond semantic search over vector embeddings, OpenSearch Service offers extensive capabilities for retrieval-augmented generation (RAG) applications. Using the connector framework and search flow pipelines in OpenSearch, you can connect to models provided by DeepSeek, Cohere, and OpenAI, as well as models hosted on Amazon Bedrock and SageMaker. In this post, we show how to connect to DeepSeek’s text generation model to enable a RAG workflow that generates text responses to user queries.
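As a rough illustration of the RAG pattern this post refers to, the sketch below ranks passages by vector similarity, places the best matches in a prompt, and hands that prompt to a text generation model. It is a toy under stated assumptions, not the post's implementation: it calls neither OpenSearch nor DeepSeek, the vectors are hand-written stand-ins for real embeddings, and every name in it (`TOY_INDEX`, `retrieve`, `build_prompt`, `answer`, and the `generate` callable) is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "vector index": (passage text, hand-written embedding).
# A real system would store model-generated embeddings in a vector database.
TOY_INDEX: List[Tuple[str, List[float]]] = [
    ("Connectors let a search cluster call externally hosted models.", [1.0, 0.0]),
    ("Shipping containers are handled at terminal berths.", [0.0, 1.0]),
    ("Search pipelines can pass retrieved passages to a text model.", [0.9, 0.1]),
]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Sequence[float],
             index: List[Tuple[str, List[float]]],
             k: int = 2) -> List[str]:
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Assemble a prompt that grounds the model in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(question: str,
           query_vec: Sequence[float],
           index: List[Tuple[str, List[float]]],
           generate: Callable[[str], str],
           k: int = 2) -> str:
    """Retrieve, build the prompt, then delegate to a text generation callable.

    `generate` is a placeholder for whatever model endpoint is in use.
    """
    return generate(build_prompt(question, retrieve(query_vec, index, k)))
```

Passing a stand-in such as `generate=lambda prompt: prompt` makes it easy to inspect exactly what the model would receive before wiring in a real endpoint.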
How EUROGATE Built a Data Mesh Architecture Using Amazon DataZone
by Dr. Clara Jensen, Marco Albright, Leila Chen, and Simon Patel on 15 JAN 2025
Category: Amazon DataZone, Amazon Redshift, Amazon SageMaker, Analytics, Customer Solutions
In this article, we explore how EUROGATE utilizes AWS services, including Amazon DataZone, to enhance data discoverability for stakeholders across various business units, thereby accelerating innovation. Two use cases highlight applications in business intelligence (BI) and data science, demonstrating the effectiveness of AWS services such as Amazon Redshift and Amazon SageMaker.
Announcing a New Unified Data Connection Experience with Amazon SageMaker Lakehouse
by Kenji Watanabe, Priya Nair, Fiona Zhao, and Jake Thompson on 16 DEC 2024
Category: Amazon Athena, Amazon SageMaker, Analytics, AWS Glue
Amazon SageMaker Lakehouse now offers unified data connectivity, allowing users to connect to, explore, and work with their data across AWS services. This post illustrates how SageMaker Lakehouse’s unified data connectivity streamlines data integration tasks by simplifying the setup and management of connections to various data sources.
An Integrated Experience for All Your Data and AI with Amazon SageMaker Unified Studio
by Andrew Carter, Maya Lin, Zachary Fox, and Kenji Watanabe on 11 DEC 2024
Category: Amazon SageMaker, Analytics, Launch
Amazon SageMaker Unified Studio serves as a comprehensive integrated development environment (IDE) for data, analytics, and AI. Discover and utilize your data with familiar AWS tools to complete end-to-end development workflows, including data analysis, processing, model training, and generative AI application development, all within a single governed environment. This post highlights how SageMaker Unified Studio consolidates your analytic workflows.
Streamlining Data Access for Your Enterprise with Amazon SageMaker Lakehouse
by Alisha Parker, Rohan Shah, and Geeta Mehta on 04 DEC 2024
Category: Amazon Redshift, Amazon SageMaker, Analytics, AWS Glue, AWS Lake Formation
Amazon SageMaker Lakehouse provides a cohesive solution for enterprise data access, merging information from both warehouses and lakes. This post demonstrates how SageMaker Lakehouse integrates disparate data sources, enabling secure access across the enterprise, and allowing teams to utilize their preferred tools for predicting and analyzing customer churn. The solution encompasses various data sources, including Amazon S3, Amazon Redshift, and AWS Glue Data Catalog, with AWS Lake Formation overseeing permissions.
Visual ETL Flows on Amazon SageMaker Unified Studio
by Priyanka Roy, Alex Tello, Gabi Heyne, Ranu Shah, and Kenji Watanabe on 04 DEC 2024
Category: Amazon SageMaker, Analytics, AWS Glue
Amazon SageMaker Unified Studio (preview) offers an integrated environment for data and AI development within Amazon SageMaker. This post demonstrates how to create low-code and no-code (LCNC) visual ETL flows for seamless data ingestion and transformation across multiple data sources.
Streamlining Data Integration with AWS Glue and Zero-ETL to Amazon SageMaker Lakehouse
by Shovan Kanjilal, Kamen Sharlandjiev, Kartikay Khator, Caio Montovani, and Vivek Pinyani on 04 DEC 2024
Category: Amazon SageMaker, Analytics, Announcements, AWS Glue
AWS has introduced zero-ETL integration support for external applications to AWS Glue, enhancing data integration for organizations. This new capability allows for the effortless replication of data from popular platforms like Salesforce, ServiceNow, and Zendesk into Amazon SageMaker Lakehouse and Amazon Redshift. This blog post features a case study on integrating ServiceNow data, detailing the steps for setting up a connector, creating a zero-ETL integration, and verifying initial data loads and change data capture (CDC). It also emphasizes the benefits of using Apache Iceberg for data versioning and time travel features within zero-ETL integrations.
Cataloging and Governing Amazon Athena Federated Queries with Amazon SageMaker Lakehouse
by Samuel Adwankar, Stuti Deshpande, Priyanka Roy, Scott Rigney, and Kenji Watanabe on 04 DEC 2024
Category: Amazon Athena, Amazon SageMaker, Analytics
This post explains how to connect to, govern, and run federated queries on data stored in Redshift, DynamoDB (Preview), and Snowflake (Preview). We use Athena, which is integrated with SageMaker Unified Studio. With SageMaker Lakehouse, data is presented to users as federated catalogs, a new type of catalog object. Finally, we show how to apply column-level security permissions in AWS Lake Formation so that analysts can access the data they need while sensitive information stays restricted.